On-Demand Compute Environment

ABSTRACT

An on-demand compute environment comprises a plurality of nodes within an on-demand compute environment available for provisioning and a slave management module operating on a dedicated node within the on-demand compute environment, wherein upon instructions from a master management module at a local compute environment, the slave management module modifies at least one node of the plurality of nodes.

PRIORITY CLAIM

The present application is a continuation of U.S. patent application Ser. No. 13/758,164, filed Feb. 4, 2013, which is a continuation of U.S. patent application Ser. No. 12/752,622, filed Apr. 1, 2010, now U.S. Pat. No. 8,370,495, issued Feb. 5, 2013, which is a continuation of U.S. patent application Ser. No. 11/276,856, filed Mar. 16, 2006, now U.S. Pat. No. 7,698,430, issued Apr. 13, 2010, which claims priority to U.S. Provisional Application No. 60/662,240 filed Mar. 15, 2005, the contents of which are incorporated herein by reference.

RELATED APPLICATION

The present application is related to U.S. application Ser. No. 11/276,852, incorporated herein by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the United States Patent & Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a resource management system and more specifically to a system and method of providing access to on-demand compute resources.

2. Introduction

Managers of clusters desire maximum return on investment, which often means high system utilization and the ability to deliver various qualities of service to various users and groups. A cluster is typically defined as a parallel computer that is constructed of commodity components and runs commodity software as its system software. A cluster contains nodes, each containing one or more processors, memory that is shared by all of the processors in the respective node, and additional peripheral devices such as storage disks, that are connected by a network that allows data to move between nodes. A cluster is one example of a compute environment. Other examples include a grid, which is loosely defined as a group of clusters, and a computer farm, which is another organization of computers for processing.

Often a set of resources organized in a cluster or a grid may have jobs submitted to it that require more capability than the set of resources has available. In this regard, there is a need in the art to be able to utilize new or different resources easily, efficiently and on demand to handle a job. The concept of “on-demand” compute resources has been developing in the high performance computing community recently. An on-demand computing environment enables companies to procure compute power for average demand and then contract remote processing power to help in peak loads or to offload all their compute needs to a remote facility. Several reference books having background material related to on-demand computing or utility computing include Mike Ault, Madhu Tumma, Oracle 10g Grid & Real Application Clusters, Rampant TechPress, 2004 and Guy Bunker, Darren Thomson, Delivering Utility Computing Business-driven IT Optimization, John Wiley & Sons Ltd, 2006.

In Bunker and Thomson, section 3.3 on page 32 is entitled “Connectivity: The Great Enabler” wherein they discuss how the interconnecting of computers will dramatically increase their usefulness. This disclosure addresses that issue. There exists in the art a need for improved solutions to enable communication and connectivity with an on-demand high performance computing center.

SUMMARY OF THE INVENTION

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth herein.

Various embodiments of the invention include, but are not limited to, methods, systems, computing devices, clusters, grids and computer-readable media that perform the processes and steps described herein.

An on-demand compute environment comprises a plurality of nodes within an on-demand compute environment available for provisioning and a slave management module operating on a dedicated node within the on-demand compute environment, wherein upon instructions from a master management module at a local compute environment, the slave management module modifies at least one node of the plurality of nodes. Methods and computer readable media are also disclosed for managing an on-demand compute environment.

A benefit of the approaches disclosed herein is a reduction in unnecessary costs of building infrastructure to accommodate peak demand. Thus, customers pay for the extra processing power they need only during those times when they need it.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended documents and drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.

FIG. 1 illustrates the basic arrangement of the present disclosure;

FIG. 2 illustrates basic hardware components;

FIG. 3 illustrates a method aspect of the disclosure;

FIG. 4 illustrates a method aspect of the disclosure;

FIG. 5 illustrates another method aspect of the disclosure; and

FIG. 6 illustrates another method aspect of the disclosure.

DETAILED DESCRIPTION OF THE INVENTION

Various embodiments are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the invention.

This disclosure relates to the access and management of on-demand or utility computing resources at a hosting center. FIG. 1 illustrates the basic arrangement and interaction between a local compute environment 104 and an on-demand hosting center 102. The local compute environment may comprise a cluster, a grid, or any other variation on these types of multiple node and commonly managed environments. The on-demand hosting center or on-demand computing environment 102 comprises a plurality of nodes that are available for provisioning and preferably has a dedicated node containing a hosting master 128 which may comprise a slave management module 106 and/or at least one other module such as the identity manager 112 and node provisioner 118.

Products such as Moab provide an essential service for optimization of a local compute environment. They provide an analysis of how and when local resources, such as software and hardware devices, are being used for the purposes of charge-back, planning, auditing, troubleshooting and reporting internally or externally. Such optimization enables the local environment to be tuned to get the most out of the resources in the local compute environment. However, there are times when more resources are needed.

Typically a hosting center 102 will have the following attributes. It allows an organization to provide resources or services to customers where the resources or services are custom-tailored to the needs of the customer. Supporting true utility computing usually requires creating a hosting center 102 with one or more capabilities as follows: secure remote access, guaranteed resource availability at a fixed time or series of times, integrated auditing/accounting/billing services, tiered service level (QoS/SLA) based resource access, dynamic compute node provisioning, full environment management over compute, network, storage, and application/service based resources, intelligent workload optimization, high availability, failure recovery, and automated re-allocation.

A management module 108 such as, by way of example, Moab™ (which may also refer to any Moab product such as the Moab Workload Manager®, Moab Grid Monitor®, etc. from Cluster Resources, Inc.) enables utility computing by allowing compute resources to be reserved, allocated, and dynamically provisioned to meet the needs of internal or external workload. Thus, the local compute environment does not need to be built out with peak usage in mind. As periodic peak resources are required, triggers can cause overflow to the on-demand environment and thus save money for the customer. The module 108 is able to respond to either manual or automatically generated requests and can guarantee resource availability subject to existing service level agreement (SLA) or quality of service (QOS) based arrangements. As an example, FIG. 1 shows a user submitting a job or a query 110 to the cluster or local environment 104. The local environment will typically be a cluster or a grid with local workload. Jobs may be submitted which have explicit resource requirements. The local environment 104 will have various attributes such as operating systems, architecture, network types, applications, software, bandwidth capabilities, etc., which are expected by the job implicitly. In other words, jobs will typically expect that the local environment will have certain attributes that will enable them to consume resources in an expected way.

Other software is shown by way of example in a distributed resource manager such as Torque 128 and various nodes 130, 132 and 134. The management modules (both master and/or slave) may interact and operate with any resource manager, such as Torque, LSF, SGE, PBS and LoadLeveler and are agnostic in this regard. Those of skill in the art will recognize these different distributed resource manager software packages.

A hosting master or hosting management module 106 may also be an instance of a Moab software product with hosting center capabilities to enable an organization to dynamically control network, compute, application, and storage resources and to dynamically provision operating systems, security, credentials, and other aspects of a complete end-to-end compute environment. Module 106 is responsible for knowing all the policies, guarantees, promises and also for managing the provisioning of resources within the utility computing space 102. In one sense, module 106 may be referred to as the “master” module in that it couples and needs to know all of the information associated with both the utility environment and the local environment. However, in another sense it may be referred to as the slave module or provisioning broker wherein it takes instructions from the customer management module 108 for provisioning resources and builds whatever environment is requested in the on-demand center 102. A slave module would have none of its own local policies but rather follows all requests from another management module. For example, when module 106 is the slave module, then a master module 108 would submit automated or manual (via an administrator) requests that the slave module 106 simply follows to manage the build out of the requested environment. Thus, for both IT and end users, a single easily usable interface can increase efficiency, reduce costs including management costs and improve investments in the local customer environment. The interface to the local environment which also has the access to the on-demand environment may be a web-interface or access portal as well. Only restrictions of feasibility may exist. The customer module 108 would have rights and ownership of all resources. The allocated resources would not be shared but be dedicated to the requestor. As the slave module 106 follows all directions from the master module 108, any policy restrictions will preferably occur on the master module 108 in the local environment.

The modules also provide data management services that simplify adding resources from across a local environment. For example, if the local environment comprises a wide area network, the management module 108 provides a security model that ensures, when the environment dictates, that administrators can rely on the system even when resources that are untrusted at a certain level have been added to the local environment or the on-demand environment. In addition, the management modules comply with n-tier web services based architectures and therefore scalability and reporting are inherent parts of the system. A system operating according to the principles set forth herein also has the ability to track, record and archive information about jobs or other processes that have been run on the system.

A hosting center 102 provides scheduled dedicated resources to customers for various purposes and typically has a number of key attributes: secure remote access, guaranteed resource availability at a fixed time or series of times, tightly integrated auditing/accounting services, varying quality of service levels providing privileged access to a set of users, and node image management allowing the hosting center to restore an exact customer-specific image before enabling access. Resources available to a module 106, which may also be referred to as a provider resource broker, will have both rigid (architecture, RAM, local disk space, etc.) and flexible (OS, queues, installed applications etc.) attributes. The provider or on-demand resource broker 106 can typically provision (dynamically modify) flexible attributes but not rigid attributes. The provider broker 106 may possess multiple resources of different types with different rigid attributes (e.g., single processor and dual processor nodes, Intel nodes, AMD nodes, nodes with 512 MB RAM, nodes with 1 GB RAM, etc.).

This combination of attributes presents unique constraints on a management system. We describe herein how the management modules 108 and 106 are able to effectively manage, modify and provision resources in this environment and provide a full array of services on top of these resources.

Utility-based computing technology allows a hosting center 102 to quickly harness existing compute resources, dynamically co-allocate the resources, and automatically provision them into a seamless virtual cluster. The management modules' advanced reservation and policy management tools provide support for the establishment of extensive service level agreements, automated billing, and instant chart and report creation.

Also shown in FIG. 1 are several other components such as an identity manager 112 and a node provisioner 118 as part of the hosting center 102. The hosting master 128 may include an identity manager interface 112 that may coordinate global and local information regarding users, groups, accounts, and classes associated with compute resources. The identity manager interface 112 may also allow the management module 106 to automatically and dynamically create and modify user accounts and credential attributes according to current workload needs. The hosting master 128 allows sites extensive flexibility when it comes to defining credential access, attributes, and relationships. In most cases, use of the USERCFG, GROUPCFG, ACCOUNTCFG, CLASSCFG, and QOSCFG parameters is adequate to specify the needed configuration. However, in certain cases, such as the following, this approach may not be ideal or even adequate: environments with very large user sets; environments with very dynamic credential configurations in terms of fairshare targets, priorities, service access constraints, and credential relationships; grid environments with external credential mapping information services; enterprise environments with fairness policies based on multi-cluster usage.

The modules address these and similar issues through the use of the identity manager 112. The identity manager 112 allows the module to exchange information with an external identity management service. As with the module's resource manager interfaces, this service can be a full commercial package designed for this purpose, or something far simpler by which the module obtains the needed information from a web service, text file, or database.

Next, attention is turned to the node provisioner 118. As an example of its operation, the node provisioner 118 can enable the allocation of resources in the hosting center 102 for workload from a local compute environment 104. The customer management module 108 will communicate with the hosting management module 106 to begin the provisioning process. In one aspect, the provisioning module 118 may generate another instance of necessary management software 120 and 122 which will be created in the hosting center environment as well as compute nodes 124 and 126 to be consumed by a submitted job. The new management module 120 is created on the fly, may be associated with a specific request and will preferably be operative on a dedicated node. If the new management module 120 is associated with a specific request or job, then once the job has consumed the resources associated with the provisioned compute nodes 124, 126 and has completed, the system would remove the management module 120 since it was only created for the specific request. The new management module 120 may connect to other modules such as module 108. The module 120 does not necessarily have to be created but may be generated on the fly as necessary to assist in communication and provisioning and use of the resources in the utility environment 102. For example, the module 106 may go ahead and allocate nodes within the utility computing environment 102 and connect these nodes directly to module 108, but in that case some batch capability may be lost as a tradeoff. The hosting master 128 having the management module 106, identity manager 112 and node provisioner 118 preferably is co-located with the utility computing environment but may be distributed. The management module 108 on the local environment may then communicate directly with the created management module 120 in the hosting center to manage the transfer of workload and consumption of on-demand center resources.

FIG. 6 provides an illustration of a method aspect of utilizing the new management module. As shown, this method comprises receiving an instruction at a slave management module associated with an on-demand computing environment from a master management module associated with a local computing environment (602) and based on the instruction, creating a new management module on a node in the on-demand computing environment and provisioning at least one compute node in the on-demand computing environment, wherein the new management module manages the at least one compute node and communicates with the master management module (604).
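By way of illustration only, the two steps of FIG. 6 may be sketched in Python as follows. The class names, the node identifiers and the rule that the first free node hosts the new management module are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementModule:
    """Hypothetical stand-in for the new management module 120."""
    node: str                      # dedicated node hosting this module
    master: str                    # master module it communicates with
    compute_nodes: list = field(default_factory=list)


class SlaveManagementModule:
    """Hypothetical stand-in for the slave management module 106."""

    def __init__(self, free_nodes):
        self.free_nodes = list(free_nodes)

    def handle_instruction(self, master, node_count):
        # Step 602: an instruction arrives from the master management module.
        # One extra node is needed to host the new management module.
        if node_count + 1 > len(self.free_nodes):
            raise RuntimeError("not enough free nodes in the on-demand environment")
        # Step 604: create a new management module on a node ...
        manager = ManagementModule(node=self.free_nodes.pop(0), master=master)
        # ... and provision the compute nodes that it will manage.
        for _ in range(node_count):
            manager.compute_nodes.append(self.free_nodes.pop(0))
        return manager
```

In this sketch the returned object records which master module it reports to, which mirrors the statement that the new management module communicates with the master management module.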

There are two supported primary usage models, a manual and an automatic model. In manual mode, utilizing the hosted resources can be as easy as going to a web site, specifying what is needed, selecting one of the available options, and logging in when the virtual cluster is activated. In automatic mode, it is even simpler. To utilize hosted resources, the user simply submits jobs to the local cluster. When the local cluster can no longer provide an adequate level of service, it automatically contacts the utility hosting center, allocates additional nodes, and runs the jobs. The end user is never aware that the hosting center even exists. He merely notices that the cluster is now bigger and that his jobs are being run more quickly.

When a request for additional resources is made from the local environment, either automatically or manually, a client module or client resource broker (which may be, for example, an instance of a management module 108 or 120) will contact the provider resource broker 106 to request resources. It will send information regarding rigid attributes of needed resources as well as quantity of resources needed, request duration, and request timeframe (e.g., start time, feasible times of day, etc.). It will also send flexible attributes which must be provisioned on the nodes 124, 126. Both flexible and rigid resource attributes can come from explicit workload-specified requirements or from implicit requirements associated with the local or default compute resources. The provider resource broker 106 must indicate if it is possible to locate requested resources within the specified timeframe for sufficient duration and of sufficient quantity. This task includes matching rigid resource attributes and identifying one or more provisioning steps required to put in place all flexible attributes.
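The matching task just described may be sketched, by way of example only, as follows. The dictionary layout, the function names and the attribute keys are hypothetical, and the sketch deliberately omits the timeframe and duration checks so as to show only the rigid and flexible attribute handling.

```python
def plan_node(node, rigid_needed, flexible_needed):
    """Return the provisioning steps for one node, or None if it cannot be used."""
    # Rigid attributes cannot be provisioned, so they must already match.
    for key, value in rigid_needed.items():
        if node["rigid"].get(key) != value:
            return None
    # Flexible attributes can be provisioned; list only those that differ.
    return [
        (key, value)
        for key, value in flexible_needed.items()
        if node["flexible"].get(key) != value
    ]


def plan_request(nodes, rigid_needed, flexible_needed, quantity):
    """Return {node name: provisioning steps}, or None if too few nodes match."""
    plans = {}
    for node in nodes:
        steps = plan_node(node, rigid_needed, flexible_needed)
        if steps is not None:
            plans[node["name"]] = steps
        if len(plans) == quantity:
            return plans
    return None
```

A result of None corresponds to the provider resource broker indicating that the request cannot be satisfied; an empty list of steps for a node means that the node already carries every requested flexible attribute.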

When provider resources are identified and selected, the client resource broker 108 or 120 is responsible for seamlessly integrating these resources in with other local resources. This includes reporting resource quantity, state, configuration and load. This further includes automatically enabling a trusted connection to the allocated resources which can perform last mile customization, data staging, and job staging. Commands are provided to create this connection to the provider resource broker 106, query available resources, allocate new resources, expand existing allocations, reduce existing allocations, and release all allocated resources.
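The set of commands listed above may be pictured with the following minimal sketch, in which both brokers are modeled in memory. The class and method names are assumptions chosen for illustration and are not the names of any actual commands; a node count stands in for the resources themselves.

```python
class ProviderBroker:
    """Hypothetical in-memory stand-in for the provider resource broker 106."""

    def __init__(self, total_nodes):
        self.total = total_nodes
        self.allocated = 0

    def available(self):
        return self.total - self.allocated


class ClientBroker:
    """Hypothetical stand-in for the client resource broker 108 or 120."""

    def __init__(self):
        self.provider = None
        self.held = 0  # nodes currently allocated to this client

    def connect(self, provider):
        self.provider = provider

    def query(self):
        return self.provider.available()

    def allocate(self, count):
        if count > self.provider.available():
            raise ValueError("requested more nodes than are available")
        self.provider.allocated += count
        self.held += count

    def expand(self, count):
        # Expanding an allocation is modeled as allocating additional nodes.
        self.allocate(count)

    def reduce(self, count):
        # Never give back more than is currently held.
        count = min(count, self.held)
        self.provider.allocated -= count
        self.held -= count

    def release_all(self):
        self.reduce(self.held)
```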

In most cases, the end goal of a hosting center 102 is to make available to a customer a complete, secure, packaged environment which allows them to accomplish one or more specific tasks. This packaged environment may be called a virtual cluster and may consist of the compute, network, data, software, and other resources required by the customer. For successful operation, these resources must be brought together and provisioned or configured so as to provide a seamless environment which allows the customers to quickly and easily accomplish their desired tasks.

Another aspect of the invention is the cluster interface. The desired operational model for many environments is providing the customer with a fully automated self-service web interface. Once a customer has registered with the host company, access to a hosting center portal is enabled. Through this interface, customers describe their workload requirements, time constraints, and other key pieces of information. The interface communicates with the backend services to determine when, where, and how the needed virtual cluster can be created and reports back a number of options to the user. The user selects the desired option and can monitor the status of that virtual cluster via web and email updates. When the virtual cluster is ready, web and email notification is provided including access information. The customer logs in and begins working.

The hosting center 102 will have related policies and service level agreements. Enabling access in a first-come, first-served model provides real benefits but in many cases, customers require reliable resource access with guaranteed responsiveness. These requirements may be any performance, resource or time based rule such as in the following examples: I need my virtual cluster within 24 hours of asking; I want a virtual cluster available from 2 to 4 PM every Monday, Wednesday, and Friday; I want to always have a virtual cluster available and automatically grow/shrink it based on current load, etc.

Quality of service or service level agreement policies allow customers to convert the virtual cluster resources to a strategic part of their business operations, greatly increasing the value of these resources. Behind the scenes, a hosting center 102 consists of resource managers, reservations, triggers, and policies. Once configured, administration of such a system involves addressing reported resource failures (e.g., disk failures, network outages, etc.) and monitoring delivered performance to determine if customer satisfaction requires tuning policies or adding resources.

The modules associated with the local environment 104 and the hosting center environment 102 may be referred to as a master module 108 and a slave module 106. This terminology relates to the functionality wherein the hosting center 102 receives requests for workload and provisioning of resources from the module 108 and essentially follows those requests. In this regard, the module 108 may be referred to as a client resource broker 108 which will contact a provider resource broker 106 (such as an On-Demand version of Moab).

The management module 108 may also be, by way of example, a Moab Workload Manager® operating in a master mode. The management module 108 communicates with the compute environment to identify resources, reserve resources for consumption by jobs, provision resources and in general manage the utilization of all compute resources within a compute environment. As can be appreciated by one of skill in the art, these modules may be programmed in any programming language, such as C or C++, and the choice of language is immaterial to the invention.

In a typical operation, a user or a group submits a job to a local compute environment 104 via an interface to the management module 108. An example of a job is a submission of a computer program that will perform a weather analysis for a television station that requires the consumption of a large amount of compute resources. The module 108 and/or an optional scheduler 128 such as TORQUE, as those of skill in the art understand, manages the reservation of resources and the consumption of resources within the environment 104 in an efficient manner that complies with policies and restrictions. The use of a resource manager like TORQUE 128 is optional and not specifically required as part of the disclosure.

A user or a group of users will typically enter into a service level agreement (SLA) which will define the policies and guarantees for resources on the local environment 104. For example, the SLA may provide that the user is guaranteed 10 processors and 50 GB of hard drive space within 5 hours of a submission of a job request. Associated with any user may be many parameters related to permissions, guarantees, priority level, time frames, expansion factors, and so forth. The expansion factor is a measure of how long the job is taking to run on a local environment while sharing the environment with other jobs versus how long it would take if the cluster was dedicated to the job only. It therefore relates to the impact of other jobs on the performance of the particular job. Once a job is submitted, it will sit in a job queue waiting to be inserted into the cluster 104 to consume those resources. The management software will continuously analyze the environment 104 and make reservations of resources to seek to optimize the consumption of resources within the environment 104. The optimization process must take into account all the SLAs of users, other policies of the environment 104 and other factors.
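One way to make the expansion factor concrete is to express it as a ratio, as in the following sketch. The disclosure describes the expansion factor only as a comparison of the two run times, so the use of a simple ratio, and the function and parameter names, are assumptions made for illustration.

```python
def expansion_factor(shared_runtime_hours, dedicated_runtime_hours):
    """Ratio of the run time while sharing the environment to the run time
    the job would need if the cluster were dedicated to it (assumed form)."""
    if dedicated_runtime_hours <= 0:
        raise ValueError("dedicated run time must be positive")
    return shared_runtime_hours / dedicated_runtime_hours
```

Under this assumed form, a job that needs 6 hours in the shared environment but would need 2 hours on a dedicated cluster has an expansion factor of 3.0, and a value of 1.0 indicates that other jobs have no impact on it.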

As introduced above, this disclosure provides improvements in the connectivity between a local environment 104 and an on-demand center 102. The challenges that exist in accomplishing such a connection include managing all of the capabilities of the various environments, their various policies, current workload, workload queued up in the job queues and so forth.

As a general statement, disclosed herein is a method and system for customizing an on-demand compute environment based on both implicit and explicit job or request requirements. For example, explicit requirements may be requirements specified with a job such as a specific number of nodes or processors and a specific amount of memory. Many other attributes or requirements may be explicitly set forth with a job submission such as requirements set forth in an SLA for that user. Implicit requirements may relate to attributes of the compute environment that the job is expecting because of where it is submitted. For example, the local compute environment 104 may have particular attributes, such as, for example, a certain bandwidth for transmission, memory, software licenses, processors and processor speeds, hard drive memory space, and so forth. Any parameter that may be an attribute of the local environment in which the job is submitted may relate to an implicit requirement. As a local environment 104 communicates with an on-demand environment 102 for the transfer of workload, the implicit and explicit requirements are seamlessly imported into the on-demand environment 102 such that the user's job can efficiently consume resources in the on-demand environment 102 because of the customization of that environment for the job. This seamless communication occurs between a master module 108 and a slave module 106 in the respective environments. As shown in FIG. 1, a new management module 120 may also be created for a specific process or job and also communicate with a master module 108 to manage the provisioning, consumption and clean up of compute nodes 124, 126 in the on-demand environment 102.

Part of the seamless communication process includes the analysis and provisioning of resources taking into account the need to identify resources such as hard drive space and bandwidth capabilities to actually perform the transfer of the workload. For example, if it is determined that a job in the queue has an SLA that guarantees resources within 5 hours of the request, and based on the analysis by the management module of the local environment the resources cannot be available for 8 hours, and if such a scenario is a triggering event, then the automatic and seamless connectivity with the on-demand center 102 will include an analysis of how long it will take to provision an environment in the on-demand center that matches or is appropriate for the job to run. That process, of provisioning the environment in the on-demand center 102, and transferring workload from the local environment 104 to the on-demand center 102, may take, for example, 1 hour. In that case, the on-demand center will begin the provisioning process one hour before the 5 hour required time such that the provisioning of the environment and transfer of data can occur to meet the SLA for that user. This provisioning process may involve reserving resources within the on-demand center 102 from the master module 108 as will be discussed more below.
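The timing in this example reduces to a simple subtraction, which may be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes that all times are expressed in hours measured from the job request.

```python
def provisioning_start(sla_hours, local_wait_hours, provision_hours):
    """Hours after the request at which provisioning must begin, or None
    when local resources will be available within the SLA window."""
    if local_wait_hours <= sla_hours:
        return None  # no triggering event: the local environment meets the SLA
    # Begin early enough that provisioning and workload transfer finish by
    # the SLA deadline; never earlier than the request itself.
    return max(0, sla_hours - provision_hours)
```

With the figures used above (a 5 hour guarantee, local resources not available for 8 hours, and 1 hour to provision and transfer), `provisioning_start(5, 8, 1)` returns 4, that is, provisioning begins four hours after the request.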

FIG. 3 illustrates an embodiment in this regard, wherein a methodcomprises detecting an event in a local compute environment (302). Theevent may be a resource need event such as a current resource need or apredicted resource need. Based on the detected event, a moduleautomatically establishes communication with an on-demand computeenvironment (304). This may also involve dynamically negotiating andestablishing a grid/peer relationship based on the resource need event.A module provisions resources within the on-demand compute environment(306) and workload is transferred from the local-environmenttransparently to the on-demand compute environment (308). Preferably,local information is imported to the on-demand environment and on-demandinformation is communicated to the local compute environment, althoughonly local environment information may be needed to be transmitted tothe on-demand environment. Typically, at least local environmentinformation is communicated and also job information may be communicatedto the on-demand environment. Examples of local environment informationmay be at least one of class information, configuration policyinformation and other information. Information from the on-demand centermay relate to at least one of resources, availability of resources, timeframes associated with resources and any other kind of data that informsthe local environment of the opportunity and availability of theon-demand resources. The communication and management of the databetween the master module or client module in the local environment andthe slave module is preferably transparent and unknown to the user whosubmitted the workload to the local environment. However, one aspect mayprovide for notice to the user of the tapping into the on-demandresources and the progress and availability of those resources.

Example triggering events may be related to at least one of a resource threshold, a service threshold, workload and a policy threshold or other factors. Furthermore, the event may be based on one of all workload associated with the local compute environment or a subset of workload associated with the compute environment or any other subset of a given parameter, or may be external to the compute environment, such as a natural disaster or power outage or predicted event.

The disclosure below provides for various aspects of this connectivity process between a local environment 104 and an on-demand center 102. The CD submitted with the priority Provisional patent application includes source code that carries out this functionality. The various aspects will include an automatic triggering approach to transfer workload from the local environment 104 to the on-demand center 102, a manual “one-click” method of integrating the on-demand compute environment 102 with the local environment 104 and a concept related to reserving resources in the on-demand compute environment 102 from the local compute environment 104.

The first aspect relates to enabling the automatic detection of a triggering event such as passing a resource threshold or service threshold within the compute environment 104. This process may be dynamic and involve identifying resources in a hosting center, allocating resources and releasing them after consumption. These processes may be automated based on a number of factors, such as: workload and credential performance thresholds; a job's current time waiting in the queue for execution (queuetime) (e.g., allocate if a job has waited more than 20 minutes to receive resources); a job's current expansion factor, which compares the effect that other jobs consuming local resources have on the particular job against a value obtained if the job were the only job consuming resources in the local environment; a job's current execution load (e.g., allocate if load on job's allocated resources exceeds 0.9); quantity of backlog workload (e.g., allocate if more than 50,000 proc-hours of workload exist); a job's average response time in handling transactions (e.g., allocate if job reports it is taking more than 0.5 seconds to process a transaction); a number of failures workload has experienced (e.g., allocate if a job cannot start after 10 attempts); overall system utilization (e.g., allocate if more than 80% of machine is utilized) and so forth. This is an example list and those of skill in the art will recognize other factors that may be identified as triggering events.
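The example thresholds listed above can be expressed as a small set of checks. The Python sketch below is illustrative only: the class name, field names and the decision to report every threshold that fired are assumptions, while the numeric limits are the example values given in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    queue_minutes: float       # time the job has waited in the queue
    execution_load: float      # load on the job's allocated resources
    backlog_proc_hours: float  # quantity of backlog workload
    response_seconds: float    # average transaction response time
    failed_starts: int         # number of failed attempts to start
    utilization: float         # overall system utilization, 0.0 to 1.0


def triggering_reasons(m: JobMetrics) -> list[str]:
    """Return the names of the example thresholds that have been passed."""
    checks = [
        ("queuetime", m.queue_minutes > 20),
        ("execution_load", m.execution_load > 0.9),
        ("backlog", m.backlog_proc_hours > 50_000),
        ("response_time", m.response_seconds > 0.5),
        ("failures", m.failed_starts >= 10),
        ("utilization", m.utilization > 0.80),
    ]
    return [name for name, fired in checks if fired]


sample = JobMetrics(queue_minutes=25, execution_load=0.5,
                    backlog_proc_hours=1_000, response_seconds=0.1,
                    failed_starts=0, utilization=0.85)
reasons = triggering_reasons(sample)
```

A non-empty result could be treated as a triggering event; how several simultaneous triggers are prioritized is left open here.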

Other triggering events or thresholds may comprise a predicted workload performance threshold. This would relate to the same listing of events above but be applied in the context of predictions made by a management module or customer resource broker.

Another listing of example events that may trigger communication with the hosting center includes, but is not limited to, events such as resource failures including compute nodes, network, storage, license (e.g., including expired licenses); service failures including DNS, information services, web services, database services, security services; external event detected (e.g., power outage or national emergency reported) and so forth. These triggering events or thresholds may be applied to allocate initial resources, expand allocated resources, reduce allocated resources and release all allocated resources. Thus, while the primary discussion herein relates to an initial allocation of resources, these triggering events may cause any number of resource-related actions. Events and thresholds may also be associated with any subset of jobs or nodes (e.g., allocate only if threshold backlog is exceeded on high priority jobs only or jobs from a certain user or project, or allocate resources only if certain service nodes fail or certain licenses become unavailable).

For example, if a threshold of 95% of processor consumption is met because 951 processors out of the 1000 processors in the environment are being utilized, then the system (which may or may not include the management module 108) automatically establishes a connection with the on-demand environment 102. Another type of threshold may also trigger the automatic connection, such as a service level received threshold, a service level predicted threshold, a policy-based threshold, or a threshold or event associated with environment changes such as a resource failure (compute node, network storage device, or service failures).

In a service level threshold, one example is where an SLA specifies a certain service level requirement for a customer, such as resources available within 5 hours. If an actual threshold is not met, i.e., a job has waited now for 5 hours without being able to consume resources, or where a threshold is predicted to not be met, these can be triggering events for communication with the on-demand center. The module 108 then communicates with the slave manager 106 to provision or customize the on-demand resources 102. The two environments exchange the necessary information to create reservations of resources, provision, handle licensing, and so forth, necessary to enable the automatic transfer of jobs or other workload from the local environment 104 to the on-demand environment 102. For a particular task or job, all or part of the workload may be transferred to the on-demand center. Nothing about a user job 110 submitted to a management module 108 changes. The on-demand environment 102 then instantly begins running the job without any change in the job or perhaps even any knowledge of the submitter.

There are several aspects of the disclosure that are shown in the source code on the CD. One is the ability to exchange information. For example, for the automatic transfer of workload to the on-demand center, the system will import remote classes, configuration policy information and other information from the local scheduler 108 to the slave scheduler 106 for use by the on-demand environment 102. Information regarding the on-demand compute environment, resources, policies and so forth is also communicated from the slave module 106 to the local module 108.

The triggering event for the automatic establishment of communication with the on-demand center and a transfer of workload to the on-demand center may be a threshold that has been passed or an event that occurred. Threshold values may comprise an achieved service level, predicted service level and so forth. For example, a job sitting in a queue for a certain amount of time may trigger a process to contact the on-demand center and transfer that job to the on-demand center to run. If a queue has a certain number of jobs that have not been submitted to the compute environment for processing, if a job has an expansion factor that has a certain value, if a job has failed to start on a local cluster one or more times for whatever reason, then these types of events may trigger communication with the on-demand center. These have been examples of threshold values that when passed will trigger communication with the on-demand environment.

Example events that also may trigger the communication with the on-demand environment include, but are not limited to, events such as the failure of nodes within the environment, storage failure, service failure, license expiration, management software failure, resource manager failure, etc. In other words, any event that may be related to any resource or the management of any resource in the compute environment may be a qualifying event that may trigger workload transfer to an on-demand center. In the license expiration context, if the license in a local environment of a certain software package is going to expire such that a job cannot properly consume resources and utilize the software package, the master module 108 can communicate with the slave module 106 to determine if the on-demand center has the requisite license for that software. If so, then the provisioning of the resources in the on-demand center can be negotiated and the workload transferred, wherein it can consume resources under an appropriate legal and licensed framework.

The basis for the threshold or the event that triggers the communication, provisioning and transfer of workload to the on-demand center may be all jobs/workload associated with the local compute environment or a subset of jobs/workload associated with the local compute environment. In other words, the analysis of when an event and/or threshold should trigger the transfer of workload may be based on a subset of jobs. For example, the analysis may be based on all jobs submitted from a particular person or group or may be based on a certain type of job, such as the subset of jobs that will require more than 5 hours of processing time to run. Any parameter may be defined for the subset of jobs on which the triggering event is based.
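Restricting the trigger analysis to a subset of jobs can be sketched with a filter predicate. In the Python fragment below, the `Job` fields and the proc-hours backlog measure are assumptions chosen for illustration; the "more than 5 hours" subset mirrors the example in the paragraph above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Job:
    owner: str        # person or group that submitted the job
    est_hours: float  # estimated processing time
    procs: int        # processors requested


def backlog_proc_hours(jobs: Iterable[Job],
                       in_subset: Callable[[Job], bool] = lambda job: True) -> float:
    """Sum proc-hours over only the jobs selected by the predicate."""
    return sum(j.procs * j.est_hours for j in jobs if in_subset(j))


queue = [Job("alice", 6, 100), Job("bob", 2, 50), Job("alice", 1, 10)]

all_backlog = backlog_proc_hours(queue)
long_backlog = backlog_proc_hours(queue, lambda j: j.est_hours > 5)
alice_backlog = backlog_proc_hours(queue, lambda j: j.owner == "alice")
```

A threshold can then be compared against whichever subset total the policy names.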

The interaction and communication between the local compute environment and the on-demand compute environment enables an improved process for dynamically growing and shrinking provisioned resource space based on load. This load balancing between the on-demand center and the local environment may be based on thresholds, events, all workload associated with the local environment or a subset of the local environment workload.

Another aspect of the disclosure is the ability to automate data management between two sites. This involves handling data management between the on-demand environment 102 and the local environment 104 in a manner that is transparent to the user. Typically, environmental information will be communicated between the local environment 104 and the on-demand environment 102. In some cases, job information may not need to be communicated because a job may be gathering its own information, say from the Internet, or for other reasons. Therefore, in preparing to provision resources in the on-demand environment, all information or a subset of information is communicated to enable the process. Yet another aspect of the invention relates to a simple and easy mechanism to enable on-demand center integration. This aspect of the invention involves the ability of the user or an administrator to command, in a single action like the click of a button or a one-click action, the integration of on-demand center information and capability into the local resource manager 108.

This feature is illustrated in FIG. 4. A module, preferably associated with the local compute environment, receives a request from an administrator to integrate an on-demand compute environment into the local compute environment (402). The creation of a reservation or of a provisioning of resources in the on-demand environment may be from a request from an administrator or local or remote automated broker. In this regard, the various modules will automatically integrate local compute environment information with on-demand compute environment information to make available resources from the on-demand compute environment to requestors of resources in the local compute environment (404). Integration of the on-demand compute environment may provide for integrating: resource configuration, state information, resource utilization reporting, job submission information, job management information, resource management, policy controls including priority, resource ownership, queue configuration, job accounting and tracking and resource accounting and tracking. Thus, the detailed analysis and tracking of jobs and resources may be communicated back from the on-demand center to the local compute environment interface. Furthermore, this integration process may also include a step of automatically creating at least one of a data migration interface and a job migration interface.

Another aspect provides for a method of integrating an on-demand compute environment into a local compute environment. The method comprises receiving a request from an administrator or via an automated command from an event trigger or administrator action to integrate an on-demand compute environment into a local compute environment. In response to the request, local workload information and/or resource configuration information is routed to an on-demand center and an environment is created and customized in the on-demand center that is compatible with workload requirements submitted to the local compute environment. Billing and costing are also automatically integrated and handled.

The exchange and integration of all the necessary information and resource knowledge may be performed in a single action or click to broaden the set of resources that may be available to users who have access initially only to the local compute environment 104. The system may receive the request to integrate an on-demand compute environment into a local compute environment in other manners as well, such as any type of multi-modal request, voice request, graffiti on a touch-sensitive screen request, motion detection, and so forth. Thus the one-click action may be a single tap on a touch sensitive display or a single voice command such as “integrate” or another command or multi-modal input that is simple and singular in nature. In response to the request, the system automatically integrates the local compute environment information with the on-demand compute environment information to make resources from the on-demand compute environment available to requestors of resources in the local compute environment.

The one-click approach is related to the automated approach, except that a human is in the middle of the process. For example, if a threshold or a triggering event is passed, an email or a notice may be sent to an administrator with options to allocate resources from the on-demand center. The administrator may be presented with one or more options related to different types of allocations that are available in the on-demand center, and via one-click or one action the administrator may select the appropriate action. For example, three options may include 500 processors in 1 hour; 700 processors in 2 hours; and 1000 processors in 10 hours. The options may be intelligent in that they may take into account the particular triggering event, costs of utilizing the on-demand environment, SLAs, policies, and any other parameters to present options that comply with policies and available resources. The administrator may be given a recommended selection based on SLAs, cost, or any other parameters discussed herein but may then choose the particular allocation package for the on-demand center. The administrator also may have an option, without an alert, to view possible allocation packages in the on-demand center if the administrator knows of an upcoming event that is not capable of being detected by the modules, such as a meeting with a group wherein they decide to submit a large job the next day which will clearly require on-demand resources. The one-click approach encapsulates the command line instruction to proceed with the allocation of on-demand resources.
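One way to picture the recommended selection is as a filter over the offered allocation packages. The Python sketch below uses the three example options from the preceding paragraph; the selection rule (earliest feasible package, then the smallest) is a hypothetical policy chosen for illustration, and an actual recommendation may also weigh cost, SLAs and other parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllocationOption:
    processors: int
    available_in_hours: float


# The three example packages from the text.
OPTIONS = [
    AllocationOption(500, 1),
    AllocationOption(700, 2),
    AllocationOption(1000, 10),
]


def recommend(options: list[AllocationOption], needed_processors: int,
              deadline_hours: float) -> Optional[AllocationOption]:
    """Pick the earliest package that meets the processor need and the deadline."""
    feasible = [o for o in options
                if o.processors >= needed_processors
                and o.available_in_hours <= deadline_hours]
    return min(feasible,
               key=lambda o: (o.available_in_hours, o.processors),
               default=None)


choice = recommend(OPTIONS, needed_processors=600, deadline_hours=5)
```

The administrator would still confirm or override the recommended package with a single action.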

One aspect of the disclosure is that, when an on-demand environment 102 and a local compute environment 104 are integrated, the overall data appears locally. In other words, the local scheduler 108 will have access to the resources and knowledge of the on-demand environment 102, but those resources, with the appropriate adherence to local policy requirements, are handled locally and appear locally to users and administrators of the local environment 104.

Another aspect of the invention that is enabled with the attached source code is the ability to specify configuration information and feed it down the line. For example, the interaction between the compute environments supports static reservations. A static reservation is a reservation that a user or an administrator cannot change, remove or destroy. It is a reservation that is associated with the resource manager 108 itself. A static reservation blocks out time frames when resources are not available for other uses. For example, if, to enable a compute environment to have workload run on (or consume) resources, a job takes an hour to provision resources, then the module 108 may make a static reservation of resources for the provisioning process. The module 108 will locally create a static reservation for the provisioning component of running the job. The module 108 will report on these constraints associated with the created static reservation within the on-demand compute environment.

Then, the module 108 will communicate with the slave module 106 if on-demand resources are needed to run a job. The module 108 communicates with the slave module 106 and identifies what resources are needed (20 processors and 512 MB of memory, for example) and inquires when those resources can be available. Assume that module 106 responds that the processors and memory will be available in one hour and that the module 108 can have those resources for 36 hours. Once all the appropriate information has been communicated between the modules 106 and 108, then module 108 creates a static reservation to block the first part of the resources which requires the one hour of provisioning. The module 108 may also block out the resources with a static reservation from hour 36 to infinity until the resources go away. Therefore, from zero to one hour is blocked out by a static reservation and from the end of the 36 hours to infinity is blocked out. In this way, the scheduler 108 can optimize the on-demand resources and ensure that they are available for local workloads. The communication between the modules 106 and 108 is preferably performed via tunneling.
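The two static reservations in this example can be modeled as blocked intervals on a timeline measured in hours. The Python sketch below is a simplified illustration; representing blocks as half-open `(start, end)` pairs and the function names are assumptions, while the one hour provisioning block and the block from hour 36 onward follow the example above.

```python
import math


def static_blocks(provisioning_hours: float,
                  release_hour: float) -> list[tuple[float, float]]:
    """Blocked intervals: the provisioning period and everything after release."""
    return [(0.0, provisioning_hours), (release_hour, math.inf)]


def is_usable(hour: float, blocks: list[tuple[float, float]]) -> bool:
    """True if local workload may be scheduled on the resources at this hour."""
    return not any(start <= hour < end for start, end in blocks)


blocks = static_blocks(provisioning_hours=1, release_hour=36)
```

A scheduler could consult `is_usable` before placing local workload on the on-demand resources.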

Another aspect relates to receiving requests or information associated with resources in an on-demand center. An example will illustrate. Assume that a company has a reservation of resources within an on-demand center but then finds out that their budget is cut for the year. There is a mechanism for an administrator to enter information such as a request for a cancellation of a reservation so that they do not have to pay for the consumption of those resources. Any type of modification of the on-demand resources may be contemplated here. This process involves translating a current or future state of the environment into a requirement for the modification of usable resources. Another example includes where a group determines that they will run a large job over the weekend that they know will need more resources than the local environment has. An administrator can submit to the local resource broker 108 information associated with a parameter, such as a request for resources, and the local broker 108 will communicate with the hosting center 106, and the necessary resources can be reserved in the on-demand center even before the job is submitted to the local environment.

The modification of resources within the on-demand center may be an increase, decrease, or cancellation of resources or reservations for resources. The parameters may be a direct request for resources or a modification of resources or may be a change in an SLA which then may trigger other modifications. For example, if an SLA prevented a user from obtaining more than 500 nodes in an on-demand center and a current reservation has maximized this request, a change in the SLA that extended this parameter may automatically cause the module 106 to increase the reservation of nodes according to the modified SLA. Changing policies in this manner may or may not affect the resources in the on-demand center.

FIG. 5 illustrates a method embodiment related to modifying resources in the on-demand compute environment. The method comprises receiving information at a local resource broker that is associated with resources within an on-demand compute environment (502). Based on the information, the method comprises communicating instructions from the local resource broker to the on-demand compute environment (504) and modifying resources associated with the on-demand compute environment based on the instructions (506). As mentioned above, examples of the type of information that may be received include information associated with a request for a new reservation, a cancellation of an existing reservation, or a modification of a reservation such as expanding or contracting the reserved resources in the on-demand compute environment. Other examples include a revised policy or revision to an SLA that alters (increases or perhaps decreases) allowed resources that may be reserved in the on-demand center. The master module 108 will then provide instructions to the slave module 106 to create or modify reservations in the on-demand computing environment or to make some other alteration to the resources as instructed.
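The modification step can be sketched as a function that applies a request to an existing reservation while respecting the node limit allowed by the SLA. The Python fragment below is only illustrative: the request kinds, the `Reservation` class and the clamping behavior are assumptions, and the 500 node cap echoes the SLA example given earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    nodes: int


def apply_request(current: Reservation, kind: str, nodes: int,
                  sla_max_nodes: int) -> Reservation:
    """Return the modified reservation, clamped to the range the SLA allows."""
    if kind == "cancel":
        return Reservation(0)
    if kind == "expand":
        requested = current.nodes + nodes
    elif kind == "contract":
        requested = current.nodes - nodes
    else:
        raise ValueError(f"unknown request kind: {kind}")
    return Reservation(max(0, min(requested, sla_max_nodes)))


held = Reservation(500)
capped = apply_request(held, "expand", 100, sla_max_nodes=500)   # SLA limit reached
raised = apply_request(held, "expand", 100, sla_max_nodes=800)   # after SLA revision
cancelled = apply_request(held, "cancel", 0, sla_max_nodes=500)
```

In this sketch the same expansion request succeeds only after the SLA limit has been raised.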

Receiving resource requirement information may be based on user specification, current or predicted workload. The specification of resources may be fully explicit, or may be partially or fully implicit based on workload or based on a virtual private cluster (VPC) package concept, where a VPC package can include aspects of an allocated or provisioning support environment and adjustments to resource request timeframes including pre-allocation, allocation duration, and post-allocation timeframe adjustments. The Application incorporated above provides information associated with the VPC that may be utilized in many respects in this invention. The reserved resources may be associated with provisioning or customizing the delivered compute environment. A reservation may involve the co-allocation of resources including any combination of compute, network, storage, license, or service resources (e.g., parallel database services, security services, provisioning services) as part of a reservation across multiple different resource types. Also, the co-allocation of resources over disjoint timeframes to improve availability and utilization of resources may be part of a reservation or a modification of resources. Resources may also be reserved with automated failure handling and resource recovery.

Another feature associated with reservations of resources within the on-demand environment is the use of provisioning padding. This is an alternate approach to the static reservation discussed above. For example, if a reservation of resources would require 2 hours of processing time for 5 nodes, then that reservation may be created in the on-demand center as directed by the client resource broker 108. As part of that same reservation or as part of a separate process, the reservation may be modified or adjusted to increase its duration to accommodate provisioning overhead and clean up processes. Therefore, there may need to be ½ hour of time in advance of the beginning of the two hour block wherein data transmission, operating system set up, or any other provisioning step needs to occur. Similarly, at the end of the two hours, there may need to be 15 minutes to clean up the nodes and transmit processed data to storage or back to the local compute environment. Thus, an adjustment of the reservation may occur to account for this provisioning in the on-demand environment. This may or may not occur automatically; for example, the user may request resources for 2 hours and the system may automatically analyze the job submitted or utilize other information to automatically adjust the reservation for the provisioning needs. The administrator may also understand the provisioning needs and specifically request a reservation with provisioning pads on one or both ends of the reservation.
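Provisioning padding amounts to widening the reserved window on one or both ends. The Python sketch below uses the figures from the example above (a 2 hour block, a ½ hour setup pad and a 15 minute clean up pad); the function name and the start time are hypothetical.

```python
from datetime import datetime, timedelta


def pad_reservation(start: datetime, duration: timedelta,
                    setup: timedelta = timedelta(0),
                    cleanup: timedelta = timedelta(0)) -> tuple[datetime, datetime]:
    """Return the padded (start, end) window for the reservation."""
    return start - setup, start + duration + cleanup


# Hypothetical start time for the 2 hour processing block.
job_start = datetime(2006, 3, 16, 12, 0)
padded_start, padded_end = pad_reservation(
    job_start,
    duration=timedelta(hours=2),
    setup=timedelta(minutes=30),
    cleanup=timedelta(minutes=15),
)
```

The padded window spans 2 hours 45 minutes in total, while the job itself still runs in the original 2 hour block.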

A job may also be broken into component parts and only one aspect of the job transferred to an on-demand center for processing. In that case, the modules will work together to enable co-allocation of resources across local resources and on-demand resources. For example, memory and processors may be allocated in the local environment while disk space is allocated in the on-demand center. In this regard, the local management module could request the particular resources needed for the co-allocation from the on-demand center and when the job is submitted for processing that portion of the job would consume on-demand center resources while the remaining portion of the job consumes local resources. This also may be a manual or automated process to handle the co-allocation of resources.
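A simple way to picture this co-allocation is to satisfy each resource type locally where possible and request the remainder from the on-demand center. The Python sketch below is illustrative only; the resource names, units and the greedy local-first rule are assumptions.

```python
def split_request(request: dict[str, int],
                  local_free: dict[str, int]) -> tuple[dict[str, int], dict[str, int]]:
    """Split a resource request into a local part and an on-demand part."""
    local: dict[str, int] = {}
    on_demand: dict[str, int] = {}
    for resource, amount in request.items():
        take = min(amount, local_free.get(resource, 0))
        if take:
            local[resource] = take
        if amount - take:
            on_demand[resource] = amount - take
    return local, on_demand


# Memory and processors are free locally; disk space is not.
request = {"procs": 20, "memory_mb": 512, "disk_gb": 200}
local_free = {"procs": 64, "memory_mb": 4096, "disk_gb": 0}
local_part, on_demand_part = split_request(request, local_free)
```

Here the processors and memory are allocated locally and only the disk space is requested from the on-demand center.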

Another aspect relates to interaction between the master management module 108 and the slave management module 106. Assume a scenario where the local compute environment requests immediate resources from the on-demand center. Via the communication between the local and the on-demand environments, the on-demand environment notifies the local environment that resources are not available for eight hours but provides the information about those resources in the eight hours. At the local environment, the management module 108 may instruct the on-demand management module 106 to establish a reservation for those resources as soon as possible (in eight hours) including, perhaps, provisioning padding for overhead. Thus, although the local environment requested immediate resources from the on-demand center, the best that could be done in this case is a reservation of resources in eight hours given the provisioning needs and other workload and jobs running on the on-demand center. Thus, jobs running or in the queue at the local environment will have an opportunity to tap into the reservation and, given a variety of parameters, say job number 12 has priority or an opportunity to get a first choice of those reserved resources.

With reference to FIG. 2, an exemplary system for implementing the invention includes a general purpose computing device 200, including a processing unit (CPU) 220, a system memory 230, and a system bus 210 that couples various system components including the system memory 230 to the processing unit 220. The system bus 210 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system may also include other memory such as read only memory (ROM) 240 and random access memory (RAM) 250. A basic input/output system (BIOS), containing the basic routine that helps to transfer information between elements within the computing device 200, such as during start-up, is typically stored in ROM 240. The computing device 200 further includes storage means such as a hard disk drive 260, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 260 is connected to the system bus 210 by a drive interface. The drives and the associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 200. In this regard, the various functions associated with the invention that are primarily set forth as the method embodiment of the invention may be practiced by using any programming language and programming modules to perform the associated operation within the system or the compute environment. Here the compute environment may be a cluster, grid, or any other type of coordinated commodity resources and may also refer to two separate compute environments that are coordinating workload, workflow and so forth, such as a local compute environment and an on-demand compute environment. Any such programming module will preferably be associated with a resource management or workload manager or other compute environment management software such as Moab but may also be separately programmed. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device is a small, handheld computing device, a desktop computer, or a computer server.

Although the exemplary environment described herein employs the hard disk, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, memory cartridges, random access memories (RAMs), read only memory (ROM), and the like, may also be used in the exemplary operating environment. The system above provides an example server or computing device that may be utilized and networked with a cluster, clusters or a grid to manage the resources according to the principles set forth herein. It is also recognized that other hardware configurations may be developed in the future upon which the method may be operable.

Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Those of skill in the art will appreciate that other embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. As can also be appreciated, the compute environment itself, being managed according to the principles of the invention, may be an embodiment of the invention. Thus, separate embodiments may include an on-demand compute environment, a local compute environment, both of these environments together as a more general compute environment, and so forth. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. Accordingly, the scope of the claims should be governed by the claims and their equivalents below rather than by any particular example in the specification.

1-18. (canceled)
 19. A method of managing a first compute environment,the method comprising: receiving, at a first software module operatingin the first compute environment, from a second software moduleassociated with an external compute environment separate from the firstcompute environment, a specification of resources to be used as part ofprocessing of a workload, the specification of resources comprising both(i) first data explicitly specifying resources, the specified resourceswhich must be provided by the first compute environment when servicingthe workload, and (ii) second data specifying one or more flexibleresources, the flexible resources which can be dynamically modified byat least the first compute environment; creating a new software modulewithin the first compute environment; provisioning at least one node inthe first compute environment to perform at least part of the processingof the workload, wherein the new software module is configured to managethe provisioned at least one node and communicate with the secondsoftware module; and removing the new software module after one or moreremoval criteria have been met.
20. The method of claim 19, wherein: the new software module is exclusively associated with the workload; and the one or more removal criteria comprise completion of the at least part of the processing of the workload.
21. The method of claim 19, wherein the second data specifying the one or more flexible resources is based at least in part on one or more attributes of the external compute environment.
22. The method of claim 19, wherein the provisioning of the at least one node is based at least in part on a computerized analysis of at least part of the workload.
23. The method of claim 19, wherein the specification of resources specifies an allocation of time, and wherein the method further comprises automatically adjusting a duration of the allocation of time.
24. The method of claim 23, wherein the automatically adjusting a duration of the allocation of time comprises automatically adjusting based at least on at least one of (i) accommodating provisioning overhead, or (ii) one or more clean up processes.
25. The method of claim 19, wherein: the external compute environment comprises a local compute environment, and the first compute environment comprises an on-demand compute environment that is separately managed from the local compute environment and configured to be utilized when the local compute environment has insufficient resources to process workload submitted to the local compute environment; and the on-demand compute environment and the local compute environment are in data communication with one another via at least one wireless or wireline communication network.
26. The method of claim 19, wherein the provisioning further comprises creating at least one virtual cluster within the first compute environment, the creation of the at least one virtual cluster comprising dynamic selection of a plurality of compute resources, and provisioning of the dynamically selected compute resources for use by at least part of the workload.
27. The method of claim 19, wherein the provisioning comprises, responsive to the receiving the specification of resources, causing (i) dynamic co-allocation of a plurality of compute resources, and (ii) provisioning of the co-allocated compute resources for use by at least part of the workload.
28. The method of claim 19, wherein the creating a new software module in the first compute environment comprises creating one or more new software instances on one or more respective ones of a plurality of compute nodes within the first compute environment.
29. The method of claim 19, wherein: the receiving the specification of resources is in preparation for an assumption of at least a portion of the workload by the first compute environment from the external compute environment; and the second data specifying one or more flexible resources comprises data relating to a network storage device requirement.
30. The method of claim 19, wherein: the receiving the specification of resources is in preparation for an assumption of at least a portion of the workload by the first compute environment from the external compute environment; and the second data specifying one or more flexible resources comprises data relating to at least one of an operating system (OS) or installed software application.
31. The method of claim 19, wherein the provisioning the at least one node in the first compute environment to perform at least part of the processing of the workload comprises using at least the first data and the second data.
32. The method of claim 31, wherein at least one of the first data or the second data is based at least in part on one or more queue attributes.
33. The method of claim 19, further comprising (i) identifying one or more resource constraints associated with processing the workload within the first compute environment, and (ii) causing dynamic modification of at least a portion of the first compute environment based at least on the identified one or more resource constraints.
34. The method of claim 33, wherein: the identifying the one or more resource constraints comprises identifying one or more resource provisioning requirements necessary for processing of the workload; and the provisioning of the at least one node comprises provisioning based at least on the one or more resource provisioning requirements.
35. The method of claim 19, wherein the receiving the specification of resources to process the workload comprises receiving a specification of resources that is configured based at least on: (i) notification of a user submitting the workload for processing that additional compute resources are required to process the workload, and (ii) affirmative action by the notified user acknowledging or requesting the additional compute resources.
36. The method of claim 19, wherein the receiving the specification of resources to process the workload comprises receiving a specification of resources that (i) is created automatically such that it requires no user intervention, and (ii) is configured based at least on determination that compute resources in addition to those of the external compute environment are required to process the workload.
37. The method of claim 19, wherein: the first compute environment comprises one or more untrusted resources; and the method further comprises receiving data from the external compute environment relating to one or more security models, the one or more security models configured to maintain a level of security even with the presence of the one or more untrusted resources within the compute environment.
38. A computerized apparatus configured for communication with a first compute environment, the computerized apparatus associated with an external compute environment separate from the first compute environment, the computerized apparatus comprising: a processor; and a storage device in data communication with the processor, the storage device comprising a non-transitory computer-readable storage medium having stored therein instructions which, when executed by the processor, cause the computerized apparatus to: receive data relating to at least one rigid resource requirement for processing workload; cause creation of a specification of resources to process the workload, the specification of resources comprising both (i) the data relating to the at least one rigid resource requirement, and (ii) data relating to at least one flexible resource requirement; cause transmission of the specification of resources from a first software module of the computerized apparatus to a second software module operating on at least one node in the first compute environment, the specification of resources requiring creation of a third software module on at least one node in the first compute environment, the third software module associated uniquely with the specification of resources and configured to manage at least one compute resource in the first compute environment to process at least part of the workload.
39. The computerized apparatus of claim 38, wherein the third software module is configured for removal from the first compute environment based at least on completion of processing of the workload.
40. The computerized apparatus of claim 39, wherein the third software module being uniquely associated with the specification of resources enables the removal of the third software module after the completion of processing of the workload.
41. The computerized apparatus of claim 38, wherein the at least one flexible resource requirement is based at least on one or more attributes of the external compute environment, the one or more attributes selected from the group consisting of: (i) operating system configuration, and (ii) one or more installed software applications.
42. The computerized apparatus of claim 38, wherein the at least one rigid resource requirement is selected from the group consisting of: (i) processor architecture, and (ii) RAM (random access memory) configuration.
43. The computerized apparatus of claim 38, wherein the external compute environment comprises a compute environment having a plurality of commonly managed compute resources, and the first compute environment comprises an on-demand compute environment that is separately managed from the local compute environment but in data communication with the external compute environment via a data network.
44. The computerized apparatus of claim 38, wherein the specification of resources is configured to enable creation of at least one virtual cluster within the first compute environment, the creation of the at least one virtual cluster comprising dynamic selection of a plurality of compute resources, and provisioning of the selected compute resources for use by at least part of the workload as part of the processing thereof.
45. The computerized apparatus of claim 38, wherein at least one of the second software module or the third software module is configured to dynamically modify the at least one compute resource in the first compute environment based at least on the data relating to the flexible resource requirement.
46. The computerized apparatus of claim 38, wherein the instructions are further configured to, when executed by the processor, cause the computerized apparatus to (i) identify one or more resource constraints associated with processing the workload within the first compute environment, and (ii) cause dynamic modification of at least a portion of the first compute environment based at least on a) the identified one or more resource constraints and b) the data relating to the at least one flexible resource requirement.
47. The computerized apparatus of claim 46, wherein the identification of the one or more resource constraints comprises identification of one or more resource provisioning requirements necessary for processing of the workload either a) by or at a prescribed time, or b) within a prescribed period or duration of time.
48. The computerized apparatus of claim 38, wherein the receipt of the data relating to the at least one rigid resource requirement comprises receipt via a computerized job request to the external compute environment, the received data requiring at least one of computer processor or memory requirements.
49. The computerized apparatus of claim 38, wherein the computerized apparatus comprises a security model configured to account for addition of one or more untrusted resources to at least one of the external compute environment or the first compute environment.
50. The computerized apparatus of claim 38, wherein the causation of creation of the specification of resources to process the workload is based at least in part on exceeding one or more trigger criteria or events associated with the external compute environment.
51. The computerized apparatus of claim 50, wherein the one or more trigger criteria or events associated with the external compute environment comprise at least one of a) a resource failure within the external compute environment; or b) exceeding a resource consumption threshold associated with the external compute environment.
52. Storage apparatus comprising a non-transitory computer-readable storage medium having stored therein instructions which, when executed by a processor apparatus of a computerized apparatus within a first compute environment, cause the computerized apparatus to perform operations comprising: receive data relating to at least one workload to be processed; based at least on an evaluation of the received data, cause creation of a specification of resources to process at least a portion of the at least one workload, the specification of resources comprising both (i) first data relating to the at least one explicit resource requirement, and (ii) second data relating to at least one second resource requirement, the at least one second resource requirement based at least on one or more attributes of the first compute environment; cause transmission of the specification of resources from a first module of the computerized apparatus to a second module operating in a second compute environment, the specification of resources configured to cause creation of a new software instance in the second compute environment, the new software instance a) particularly associated with the at least one workload, and b) configured to communicate with at least one compute resource in the second compute environment to process the at least one workload.
53. The storage apparatus of claim 52, wherein the instructions are further configured to, when executed, receive data communications at the first software module from the second compute environment indicating that the new software instance has been removed from the second compute environment after the at least one workload has been processed.
54. A method of operating a first compute environment to cause utilization of compute resources within a second, separately managed compute environment, the method comprising: receiving at least one first data structure at the first compute environment, the at least one data structure comprising a request to process workload, the request comprising at least one explicit resource requirement; creating at least one second data structure, the at least one second data structure comprising both (i) data relating to the at least one explicit resource requirement, and (ii) data relating to at least one additional resource requirement, the at least one additional resource requirement based at least on at least one of a network attribute or operating system attribute of the first compute environment; causing transmission, from a first software module of the first compute environment to a second software module operating in the second compute environment, the at least one second data structure, the transmission configured to cause creation of a new software instance in the second compute environment, the new software instance configured to communicate with at least one compute node in the second compute environment to effect processing of at least a portion of the workload; and receiving at least one data communication at the first compute environment from the second compute environment, the at least one data communication indicating that the new software instance has been removed from the second compute environment pursuant to completion of processing of the at least a portion of the workload.